Abstract

Aggregation of noisy observations involves a difficult tradeoff between observation quality, which 
can be increased by increasing the number of observations, and aggregation quality, which decreases 
if the number of observations is too large. We clarify this behavior for a prototypical system in which 
arbitrarily large numbers of observations exceeding the system capacity can be aggregated using 
lossy data compression. We show the existence of a scaling relation between the collective error and 
the system capacity, and show that large scale lossy aggregation can outperform lossless aggregation 
above a critical level of observation noise. Further, we show that universal results for scaling and 
critical value of noise which are independent of system capacity can be obtained by considering 
asymptotic behavior when the system capacity increases toward infinity. 

This letter presents results which give a new perspective on the growing field of sensory 
data aggregation by clarifying fundamental principles of large-scale aggregation. Examples 
of large-scale aggregation of observations include astronomical observations [1], biological 
sensing [2], early detection of natural disasters such as earthquakes, tidal waves and floods, 
and wireless sensor networks. Errors in observations can be reduced by collecting observation 
data from more sensors. However, collecting data from many sensors usually involves 
some cost in terms of network resources, resulting in fundamental tradeoffs [5]. The 
theoretical understanding of these tradeoffs in natural and engineered systems is now a high 
priority. 

An important fundamental problem in this field is the problem of aggregating independent 
observations of the same phenomenon with a resource constraint. Previous works have 
analyzed the tradeoff behavior between aggregate data rate and sensing error from the 
fundamental view of information theory. The analysis has been extended to include the 
situation where arbitrarily large numbers of samples can be collected by reducing the data 
aggregated from each sample using lossy data compression. However, so far results have 
only been obtained for the fundamental information theoretic bounds with infinitely many 
sensors [6, 7], or specific situations in which the number of sensors is fixed [8]. The previous 
works do not include the situation where the number of observations can be varied, and thus 
the results are not sufficient to support our understanding and design of real world systems. 

In this paper we introduce a modification of the common basic model for data aggregation 
with compression which makes it more tractable and amenable to analysis when the number 
of sensors can vary. Specifically, we consider independent decompression of each observation 
in a discrete version of the CEO problem. We show that this model reveals a new 
property, the existence of a noise threshold beyond which large-scale aggregation is superior 
to lossless aggregation with no compression. This can be seen as a manifestation of "more 
is different" in sensor networks. Moreover, we show that universal results for the scaling 
behavior of collective estimation error can be obtained by considering asymptotic behavior 
when the system capacity diverges to infinity. 

Suppose that we have $L$ independent sensors, each of which independently observes an $M$-bit 
state $X$, with bits $X^{\mu}$ for $\mu = 1, \ldots, M$, of a common, uniform binary source, and obtains an $M$-bit 
observation $Y(a)$ ($a = 1, \ldots, L$), where each bit $Y^{\mu}(a)$ has a common probability $p$ of error, 
i.e. of differing from the corresponding source bit $X^{\mu}$. The value of $p$ specifies the level of 
observation noise. Now the sensors independently compress their $M$-bit observations into 
shorter $N$-bit codewords, $Z(a)$, and send them to the aggregator. The condition 'independent' 
excludes the possibility of mutual communication between sensors. We assume the 
rate $R = N/M$ is common to all the sensors. In addition, we suppose that the sum total of 
the rates, the system capacity $\lambda$, is fixed, with 

$$\lambda = LR . \qquad (1)$$

The aggregator then decodes every $N$-bit codeword independently to obtain $L$ separate $M$-bit 
reproductions $\hat{Y}(a)$ ($a = 1, \ldots, L$). Finally, the $\hat{Y}(a)$ are used to obtain a single collective 
estimator $\hat{X}$. We analyze the behavior of the bit error probability, denoted $p_e(p, R; \lambda)$, in 
the collective estimate. 

The theoretical lower bound of average distortion for a given rate R is given by the 
distortion-rate function, or simply the Shannon bound [10]. Though we know that the 
bound could be achieved asymptotically by using Shannon's random codes, the exponential 
encoding complexity prohibits us from using them in practice. For uniform binary sources, 
however, an alternative approach has been recently developed based on linear codes with 
iterative, or message passing, encoding achieving close to the theoretical limit [11, 12, 13]. 
Applying these new results allows us to obtain numerical results for arbitrary data reduction. 

FIG. 1 shows typical results from a numerical experiment for the average values of the per-bit 
error probability, $p_e(p, R; \lambda)$, obtained using a linear code with an iterative encoder [11]. The 
linear codes are defined by a class of sparse matrices having $K$ ones ('1') per row and $C$ ones 
per column, respectively, where $K/C = N/M$. Therefore we may write $R = K/C$ [14]. For 
ease of comparison, the values of error probability $p_e(p, R; \lambda)$ for noise $p$ and rate $R = N/M$ 
are divided by a reference level $p_e(p, 1; \lambda)$ for $R = 1$ under the same system capacity $\lambda$ [15]. 

The example in FIG. 1 demonstrates the following two points. (1) There exists a threshold 
value of noise where lossy large-scale aggregation becomes superior to lossless aggregation. 
Lossless aggregation with $R = 1$ outperforms lossy aggregation with $R$ smaller than 1 
at lower noise levels. However, at higher noise levels the alternative strategy with lossy data 
compression becomes superior. (2) There exists a scaling relation with respect to system 
capacity. The error curves have a universal shape in the sense that plots for different $\lambda$ 
overlap with appropriate re-scaling, as shown by the example for $\lambda = 500$ using the scale on 
the right side. This observation implies a scaling law for the data aggregation with respect 
to $\lambda$. Introducing the coefficient $\beta$ we can write the empirical scaling relation as follows: 

$$\log\left[\frac{p_e(p, R; \beta\lambda)}{p_e(p, 1; \beta\lambda)}\right] = \beta \log\left[\frac{p_e(p, R; \lambda)}{p_e(p, 1; \lambda)}\right] . \qquad (2)$$

Using the base-10 logarithm, the scaling in FIG. 1 is well described by the scaling factor $\beta = 2$. 

FIG. 1: Semilog plots for average error probability in noisy data aggregation using linear codes 
with $K = 2$. The values of error probability $p_e(p, R; \lambda)$ for noise $p$ and rate $R = K/C$ are divided 
by a reference level $p_e(p, 1; \lambda)$ under the same system capacity $\lambda$. Here parameters are chosen to 
be $\lambda = 500$ with $C = 3$ (pluses, right scale) and $\lambda = 1000$ with $C = 3$ and 12 (circles and squares, 
left scale), respectively. 

In this letter, we present a theoretical analysis which explains these empirical results, and 
presents them in a universal form. First, we assume that the error due to lossy compression 
is independent of $\mu$ and $a$, and denote it by $D$, that is, $\langle \delta(Y^{\mu}(a), -\hat{Y}^{\mu}(a)) \rangle = D$, with bits represented by $\pm 1$. Here we use 
Kronecker's delta $\delta$, and the brackets denote averaging over random variables. This includes 
the standard exchangeable sensor ansatz for our model [6], [7], which means that all sensors 
have the same rate $R$ and distortion $D$. The possible value of the distortion $D$ depends on $R$, 
so we explicitly denote $D$ as $D(R)$. The combined error probability for $\hat{Y}^{\mu}(a)$, independent 
of $\mu$ and $a$, is obtained as 

$$\rho = (1 - 2p) D(R) + p . \qquad (3)$$

The combined error probability $\rho$ is a function of both $p$ and $R$. In particular, equation 
(3) implies that $\rho$ is a decreasing function of $R$ for $p < 1/2$, since $D(R)$ should be a decreasing function 
of $R$. 
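
As an illustration of Eq. (3), the following minimal sketch treats observation noise and compression error as two independent bit flips, with probabilities p and D respectively, and checks the closed form against a simple Monte Carlo estimate. The independence of the two flips and the function names `combined_error` and `simulate_combined_error` are assumptions made for this example only.

```python
import random


def combined_error(p: float, D: float) -> float:
    """Closed form of Eq. (3): combined error probability of a reproduced bit."""
    return (1.0 - 2.0 * p) * D + p


def simulate_combined_error(p: float, D: float,
                            n_bits: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate, assuming independent noise and compression flips."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_bits):
        flipped_by_noise = rng.random() < p
        flipped_by_compression = rng.random() < D
        # The reproduced bit is wrong only if exactly one of the flips occurred.
        if flipped_by_noise != flipped_by_compression:
            errors += 1
    return errors / n_bits


if __name__ == "__main__":
    p, D = 0.1, 0.2
    print("closed form :", combined_error(p, D))
    print("simulation  :", simulate_combined_error(p, D))
```

Two flips cancel each other, which is why the combined probability is p(1 - D) + (1 - p)D rather than p + D.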

Since we assume Bernoulli statistics, the best estimate from the set of aggregated values 
can be obtained by the simple majority-vote operation: 

$$\hat{X}^{\mu} = \mathrm{sgn}\left[ \sum_{a=1}^{L} \hat{Y}^{\mu}(a) \right] .$$

Then, the error probability for the final estimate is given, in terms of $\rho$ and $L$, by 

$$p_e(p, R; \lambda) = \sum_{l=(L+1)/2}^{L} Q_{\rho}(l; L) ,$$

which is just the probability of getting more than $L/2$ errors out of $L$ 
Bernoulli trials. We assume for simplicity that only odd values of $L$ are taken. Here $Q_{\rho}(l; L) = 
\binom{L}{l} \rho^{l} (1 - \rho)^{L - l}$ represents the binomial distribution. 
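
The majority-vote error probability above is a plain binomial tail, which can be evaluated exactly for moderate L. The sketch below is one way to do so; the function name `majority_error` is hypothetical and the direct summation is only intended for L up to roughly a thousand.

```python
from math import comb


def majority_error(rho: float, L: int) -> float:
    """Probability of more than L/2 errors among L independent bits.

    rho is the combined per-bit error probability and L must be odd,
    as assumed in the text.
    """
    if L < 1 or L % 2 == 0:
        raise ValueError("L must be a positive odd integer")
    first = (L + 1) // 2  # smallest number of errors that flips the vote
    return sum(comb(L, l) * rho**l * (1.0 - rho) ** (L - l)
               for l in range(first, L + 1))


if __name__ == "__main__":
    for L in (1, 3, 11, 101):
        print(L, majority_error(0.3, L))
```

For a fixed per-bit error below 1/2 the printed values decrease with L, which is the first of the two competing effects discussed in the text.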

It is obvious that $p_e(p, R; \lambda)$ is a decreasing function of $L$ if $\rho$ is fixed. However, due to the 
constraint (1), and the decrease in distortion $D(R)$ with increase of $R$, $\rho$ actually increases 
with an increase of $L$, resulting in contrary effects on $p_e(p, R; \lambda)$. Therefore the challenge 
here is to incorporate consideration of the distortion $D$ in a way which clarifies the interplay 
between the contrary effects induced by the constraint (1). 

In the following, we consider the asymptotic analysis in the limit of large $\lambda$, for which 
we can obtain explicit results. For sufficiently large $L$, the binomial distribution $Q_{\rho}(l; L)$ 
is well approximated by the Gaussian distribution $N(L\rho, L\rho(1 - \rho))$ with mean $L\rho$ and 
variance $L\rho(1 - \rho)$. Now we examine the asymptotic behavior for $\lambda$. Write $a(p, R) = 
(1 - 2p)(1 - 2D(R))$, so that $1 - 2\rho = a(p, R)$ and $\rho(1 - \rho) = (1 - a(p, R))(1 + a(p, R))/4$, and define, for simplicity, 

$$\gamma = \frac{a(p, R)\sqrt{\lambda}}{\sqrt{R\,(1 - a(p, R))(1 + a(p, R))}} .$$

Then, in the limit $\lambda \to \infty$, the asymptotic expansion of the cumulative Gaussian distribution 
gives 

$$p_e(p, R; \lambda) \simeq \frac{1}{2}\,\mathrm{erfc}\left(\frac{\gamma}{\sqrt{2}}\right) ,$$

where $\mathrm{erfc}(x)$ is the complementary error function [16]. By analogy with large deviation 
theory [17], we can define and calculate the exponential rate of decay as follows: 

$$I_p(R) = -\lim_{\lambda \to \infty} \frac{1}{\lambda} \ln p_e(p, R; \lambda) = \frac{a(p, R)^2}{2R\,(1 - a(p, R))(1 + a(p, R))} \qquad (0 < R \le 1) . \qquad (4)$$

Notice that the above formula holds for any function $D(R)$. Indeed this universal property 
well describes the exponential scaling (2). 
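
To make the link between the decay rate (4) and the scaling relation (2) concrete, the sketch below evaluates the decay rate for a distortion value supplied by the caller, since the formula holds for any D(R), and then forms the leading-order log ratio of error probabilities. The function names are hypothetical, and the reference level at R = 1 is assumed here to be lossless (D = 0).

```python
def decay_rate(p: float, R: float, D: float) -> float:
    """Decay rate of Eq. (4) for noise p, rate R and distortion D = D(R)."""
    if not (0.0 < p < 0.5 and 0.0 < R <= 1.0 and 0.0 <= D <= 0.5):
        raise ValueError("expected 0 < p < 1/2, 0 < R <= 1, 0 <= D <= 1/2")
    a = (1.0 - 2.0 * p) * (1.0 - 2.0 * D)
    return a * a / (2.0 * R * (1.0 - a) * (1.0 + a))


def leading_log_ratio(p: float, R: float, D: float, capacity: float) -> float:
    """Leading-order ln[p_e(p, R) / p_e(p, 1)] at the given system capacity.

    Uses p_e ~ exp(-capacity * I_p(R)) and assumes D = 0 at R = 1.
    """
    return -capacity * (decay_rate(p, R, D) - decay_rate(p, 1.0, 0.0))


if __name__ == "__main__":
    p, R, D = 0.3, 0.5, 0.11  # illustrative values only
    small = leading_log_ratio(p, R, D, 500.0)
    large = leading_log_ratio(p, R, D, 1000.0)
    print(small, large, large / small)  # the ratio is the scaling factor
```

Because the leading-order log ratio is linear in the system capacity, doubling the capacity doubles it, in line with the scaling factor of 2 reported for FIG. 1.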



In particular, the smallest average distortion $D(R)$ is obtained in the limit of $M \to \infty$, 
and is called the distortion-rate function [10]. In our model, its inverse function, the 
rate-distortion function [10], can be analytically given by 

$$R(D) = 1 + D \log_2 D + (1 - D) \log_2 (1 - D) . \qquad (5)$$

We may use either the distortion-rate function or the rate-distortion function to describe 
the optimal boundary, since the two descriptions are equivalent in the large M limit. 
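
Since Eq. (5) gives the rate as a function of the distortion, the distortion at a given rate has to be found by numerical inversion. A minimal sketch using bisection on the interval between 0 and 1/2, where the rate is monotonically decreasing, is shown below; the function names and the tolerance are choices made for this example.

```python
from math import log2


def rate_distortion(D: float) -> float:
    """Eq. (5) for a uniform binary source, valid for 0 <= D <= 1/2."""
    if not 0.0 <= D <= 0.5:
        raise ValueError("D must lie in [0, 1/2]")
    if D == 0.0:
        return 1.0  # the limit of D * log2(D) as D -> 0 is 0
    return 1.0 + D * log2(D) + (1.0 - D) * log2(1.0 - D)


def distortion_rate(R: float, tol: float = 1e-12) -> float:
    """Invert Eq. (5) by bisection: the D in [0, 1/2] with rate R."""
    if not 0.0 <= R <= 1.0:
        raise ValueError("R must lie in [0, 1]")
    lo, hi = 0.0, 0.5  # rate_distortion(lo) = 1 and rate_distortion(hi) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid) > R:
            lo = mid  # rate still too high, so the distortion must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for R in (0.1, 0.5, 0.9):
        print(R, distortion_rate(R))
```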

Now assume hereafter that the distortion-rate function $D(R)$ is the specific case implicitly 
given by the inverse formula (5) for $R(D)$. Then the asymptotics of $R(D)$ enable us to obtain 
the large-scale decay rate as 

$$I_p(0) = -\lim_{R \to 0} \lim_{\lambda \to \infty} \frac{1}{\lambda} \ln p_e(p, R; \lambda) = (1 - 2p)^2 \ln 2 .$$
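
The small-rate limit can be checked numerically by evaluating the decay rate (4) with the distortion obtained from (5) at a small but finite rate. The sketch below does this with a bisection-based inversion; all function names are hypothetical and the chosen rate of 1e-4 is only illustrative.

```python
from math import log, log2


def rate_distortion(D: float) -> float:
    """Eq. (5) for 0 <= D <= 1/2."""
    if D == 0.0:
        return 1.0
    return 1.0 + D * log2(D) + (1.0 - D) * log2(1.0 - D)


def distortion_rate(R: float, tol: float = 1e-12) -> float:
    """Numerical inverse of Eq. (5) by bisection on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decay_rate_shannon(p: float, R: float) -> float:
    """Eq. (4) with D(R) taken from the Shannon bound, Eq. (5)."""
    a = (1.0 - 2.0 * p) * (1.0 - 2.0 * distortion_rate(R))
    return a * a / (2.0 * R * (1.0 - a) * (1.0 + a))


def small_rate_limit(p: float) -> float:
    """Limiting decay rate for R -> 0 stated in the text."""
    return (1.0 - 2.0 * p) ** 2 * log(2.0)


if __name__ == "__main__":
    p = 0.3
    print(decay_rate_shannon(p, 1e-4), small_rate_limit(p))
```

The two printed numbers should agree closely, reflecting that near D = 1/2 the rate in (5) vanishes quadratically in 1 - 2D, at the same order as the square of a(p, R).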

Now we can see that if we compare just the two aggregation strategies $R = 1$ or $R \to 0$, the 
threshold value of noise $p_t$ corresponding to the switch of the superior aggregation can be 
determined by solving the equation 

$$(1 - 2p_t)^2 \ln 2 = \frac{(1 - 2p_t)^2}{2\left(1 - (1 - 2p_t)^2\right)} .$$

The analytical solution $p_t = 0.236$ gives the threshold beyond which the large-scale 
aggregation with $R \to 0$ outperforms the $R = 1$ strategy. 
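
The threshold equation can be solved in closed form by cancelling the common factor, the square of 1 - 2p, from both sides. The short sketch below evaluates that solution and confirms which strategy has the larger decay rate on either side of it; the function names are hypothetical.

```python
from math import log, sqrt


def rate_lossless(p: float) -> float:
    """Decay rate for R = 1 (no compression, D = 0)."""
    c2 = (1.0 - 2.0 * p) ** 2
    return c2 / (2.0 * (1.0 - c2))


def rate_large_scale(p: float) -> float:
    """Decay rate in the limit R -> 0 under the Shannon bound."""
    return (1.0 - 2.0 * p) ** 2 * log(2.0)


def threshold_noise() -> float:
    """Noise level at which the two decay rates coincide.

    After cancelling (1 - 2p)^2 the condition reads
    ln 2 = 1 / (2 * (1 - (1 - 2p)^2)).
    """
    c = sqrt(1.0 - 1.0 / (2.0 * log(2.0)))
    return 0.5 * (1.0 - c)


if __name__ == "__main__":
    p_t = threshold_noise()
    print("threshold:", round(p_t, 3))
    print("below:", rate_lossless(0.1) > rate_large_scale(0.1))
    print("above:", rate_lossless(0.4) < rate_large_scale(0.4))
```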

Next we numerically examine the value of $R$ which maximizes $I_p(R)$ for a given $p$. The 
optimal value $R^*$ is plotted in FIG. 2 as a function of $p$. We find that the optimal rate 
vanishes, i.e. $R^* = 0$, for noise levels larger than a critical point $p_c = 0.295$. In contrast, 
we can always find non-zero optimal values of $R$ below this point. In particular, if the noise 
level is near zero, then $R = 1$ is optimal. The change in value of optimal $R^*$ with respect to 
noise level $p$ is continuous at $p_c$, as in a second order phase transition. 
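
A numerical search for the optimal rate can be sketched as a simple grid scan of the decay rate (4), with the distortion obtained by inverting (5). The grid resolution, the bisection tolerance and the function names below are assumptions of this example and not the procedure used to produce FIG. 2; on a finite grid a vanishing optimal rate shows up as the smallest grid point.

```python
from math import log2


def rate_distortion(D: float) -> float:
    """Eq. (5) for 0 <= D <= 1/2."""
    if D == 0.0:
        return 1.0
    return 1.0 + D * log2(D) + (1.0 - D) * log2(1.0 - D)


def distortion_rate(R: float, tol: float = 1e-12) -> float:
    """Numerical inverse of Eq. (5) by bisection on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decay_rate(p: float, R: float) -> float:
    """Eq. (4) with D(R) taken from the Shannon bound."""
    a = (1.0 - 2.0 * p) * (1.0 - 2.0 * distortion_rate(R))
    return a * a / (2.0 * R * (1.0 - a) * (1.0 + a))


def optimal_rate(p: float, grid_size: int = 1000) -> float:
    """Grid point R = i / grid_size (i = 1..grid_size) maximizing the decay rate."""
    best_R, best_I = 1.0 / grid_size, float("-inf")
    for i in range(1, grid_size + 1):
        R = i / grid_size
        I = decay_rate(p, R)
        if I > best_I:
            best_R, best_I = R, I
    return best_R


if __name__ == "__main__":
    for p in (0.05, 0.27, 0.40):
        print(p, optimal_rate(p))
```

Scanning p on a fine grid with this function gives a rough picture of how the optimal rate moves from values near 1 at low noise toward 0 at high noise.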

We note that the analytical results presented here using (4) and (5) are consistent with the 
results of the numerical simulations with linear codes. That is, the exponential rate of decay 
(4) well describes the scaling law (2). Moreover, they add more specific and fundamental 
conditions to our first observation on FIG. 1 that aggregation with $R$ smaller than 1 is 
superior for larger noise. The critical point beyond which the strategy with $R = 1$ is not 
optimal in FIG. 2 indicates the lowest bound for such a threshold, and is obviously consistent 
with the numerical simulations. 

FIG. 2: Optimal rate $R^*$ for lossy aggregation of observations from independent sensors in a noisy 
environment with noise level $p$. $R^*$ is the optimal value of $R$, the aggregation rate per sensor, which 
maximizes the asymptotic decay rate $I_p(R)$ of error probability with increase of system capacity 
$\lambda$. $I_p(R)$ is defined in (4). Distortion due to lossy compression is given implicitly by (5). For 
comparison, $R^{\dagger}$ is the pessimistic value of $R$ which minimizes $I_p(R)$. 

Now let us consider the value of $R$ which minimizes $I_p(R)$, say $R^{\dagger}$. In contrast with the 
continuous change in the behavior of optimal $R^*$, the pessimistic $R^{\dagger}$ shows an abrupt change 
with respect to the noise $p$. Our numerical analysis indicates that there are only two cases 
for the worst solution: $R^{\dagger} = 0$ and $R^{\dagger} = 1$, so the threshold value of noise $p_t$ corresponds to 
the switch of $R^{\dagger}$. 

We note that in the intermediate range of $p$ the optimal $R^*$ is a finite value between $R = 1$ 
and $R = 0$. It is natural to ask how much the estimates obtained with these intermediate 
values of $R^*$ differ from the estimates obtained using the extreme values of $R = 1$ or $R = 0$. 
FIG. 3 shows the noise dependence of decay rates $I_p(R)$ with $R = 0$, $1$, and $R^*$, respectively. 
The size of the differences $I_p(R^*) - I_p(1)$ and $I_p(0) - I_p(1)$ is shown in the inset of FIG. 3. 
For comparison with these results, which were obtained using the Shannon limit, the 
rate-distortion function in (5), we also show the result obtained for the linear code with $K = 2$, 
corresponding to FIG. 1. This result for $K = 2$ was obtained using the replica method for 
diluted spin systems. First we note that in the case of compression using $R(D)$, 
expression (5), the combination strategy of using only either $R = 1$ or $R \to 0$, switching at 
the threshold point $p_t$, well approximates the optimal performance given by $R^*$. Next, we 
focus on the behavior of the difference $I_p(R^*) - I_p(1)$ with respect to the noise $p$ (solid line 
in inset). The largest gain is achieved at $p^* = 0.305$ (indicated in the figure by a vertical 
dotted line), which differs slightly from the critical value $p_c = 0.295$. Finally, we 
consider the result for the linear code with $K = 2$. It shows a similar threshold behavior: 
the value of $I_p(R)$ for $R \to 0$ becomes greater than the value for $R = 1$ when the noise 
$p$ exceeds a threshold value [20]. However, the gain is less than that obtained for the 
rate-distortion function, which shows that there is still room for improvement by using alternative 
techniques [21], [22]. 

Our results show that the optimal aggregation for a system of sensors with constrained 
system capacity exhibits a kind of threshold behavior with respect to the observation noise 
level. If we imagine the system autonomously switching to the optimal aggregation method, 
then it would appear to be a phase transition behavior. This result is significant for under- 
standing the principles of large scale aggregation in sensing systems, natural or engineered. 
We described the behavior of the optimal aggregation rate per sensor $R = \lambda/L$, the ratio 
of the system capacity $\lambda$ and the number of sensors $L$. The analysis shows that in the high 
noise region beyond a critical value of noise, the rate $R$ should approach zero in order to 
reduce collective estimation error. This means that very many sensors with $L \gg \lambda$ should 
be used. In contrast, if the noise level is lower than the critical point, the ratio $R$ should 
take a positive value. In this case, the number of sensors scales as $L = O(\lambda)$. 

This work has been supported in part by Grant-in-Aid for Scientific Research on Priority 
Areas, Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, 
No. 18079015. 